22 research outputs found
From Imitation to Prediction, Data Compression vs Recurrent Neural Networks for Natural Language Processing
Recent studies [1][13][12] used Recurrent Neural Networks for generative
processes, and their surprising performance can be explained by their
ability to make good predictions. Data compression is also based on
prediction. The question is therefore whether a data compressor can perform
as well as recurrent neural networks in natural language processing tasks.
If so, the next question is whether a compression algorithm is even more
intelligent than a neural network in specific tasks related to human
language. Along the way we discovered what we think is the fundamental
difference between a Data Compression Algorithm and a Recurrent Neural
Network.
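The link between prediction and compression mentioned in this abstract can be illustrated with a small sketch. It is not the method of the paper: it assumes a simple adaptive order-1 character model with add-one smoothing over a 256-symbol alphabet (the name `ideal_code_length_bits` and the constant `ALPHABET_SIZE` are hypothetical) and sums the ideal code length, -log2 p, that an entropy coder driven by those predictions would approach.

```python
import math
from collections import defaultdict

# Assumption for illustration: every character is one of 256 possible symbols.
ALPHABET_SIZE = 256


def ideal_code_length_bits(text: str) -> float:
    """Sum of -log2 p(symbol | previous symbol) under an adaptive model.

    The model is updated after each symbol, so a decoder could rebuild the
    same probabilities. Better predictions give a shorter total code length.
    """
    counts = defaultdict(lambda: defaultdict(int))  # context -> symbol -> count
    totals = defaultdict(int)                       # context -> symbols seen
    bits = 0.0
    context = ""  # empty context before the first symbol
    for symbol in text:
        # Add-one (Laplace) smoothing keeps unseen symbols at nonzero probability.
        p = (counts[context][symbol] + 1) / (totals[context] + ALPHABET_SIZE)
        bits += -math.log2(p)
        counts[context][symbol] += 1
        totals[context] += 1
        context = symbol
    return bits
```

Under this sketch a highly repetitive string such as `"ab" * 50` costs fewer bits than a string of 100 distinct characters, because the model's predictions improve as the pattern repeats.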
Hash2Vec: Feature Hashing for Word Embeddings
In this paper we propose applying feature hashing to create word embeddings for natural language processing. Feature hashing has been used successfully to create document vectors in related tasks such as document classification. In this work we show that feature hashing can be applied to obtain word embeddings in time linear in the size of the data. The results show that this algorithm, which needs no training, is able to capture the semantic meaning of words. We compare the results against GloVe and show that they are similar. As far as we know, this is the first application of feature hashing to the word embeddings problem, and the results indicate that it is a scalable technique with practical results for NLP applications. Sociedad Argentina de Informática e Investigación Operativa (SADIO)
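A minimal sketch of how feature hashing might yield word vectors in a single pass over the data is given here. It is an illustration under stated assumptions, not the algorithm of the paper: the context window, the signed hashing scheme, and the names `hashed_embeddings`, `cosine` and `_hash` are all hypothetical choices.

```python
import hashlib
import math
from collections import defaultdict


def _hash(token: str, salt: str) -> int:
    """Deterministic 64-bit hash (Python's built-in hash() is salted per run)."""
    data = (salt + ":" + token).encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def hashed_embeddings(tokens, dim=64, window=2):
    """One pass over the tokens: each word accumulates its hashed context words.

    Assumed scheme: a context word is mapped to one of `dim` buckets by one
    hash and to a sign (+1 or -1) by another, as in signed feature hashing.
    """
    vectors = defaultdict(lambda: [0.0] * dim)
    for i, word in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j == i:
                continue
            ctx = tokens[j]
            bucket = _hash(ctx, "bucket") % dim
            sign = 1.0 if _hash(ctx, "sign") % 2 == 0 else -1.0
            vectors[word][bucket] += sign
    return dict(vectors)


def cosine(u, v):
    """Cosine similarity, returning 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0
```

For example, `hashed_embeddings("the cat sat on the mat".split(), dim=32)` returns one 32-dimensional vector per distinct word, and `cosine` can then compare two of them. No training step is involved: the cost is proportional to the number of tokens times the window size.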
Generic LSH Families for the Angular Distance Based on Johnson-Lindenstrauss Projections and Feature Hashing LSH
In this paper we propose the creation of generic LSH families for the angular distance based on Johnson-Lindenstrauss projections. We show that feature hashing is a valid J-L projection and propose two new LSH families based on feature hashing.
These new LSH families are tested on both synthetic and real datasets, with very good results and a considerable performance improvement over other LSH families. While the theoretical analysis is done for the angular distance, these families can also be used in practice for the Euclidean distance with excellent results [2]. Our tests using real datasets show that the proposed LSH functions work well for the Euclidean distance. Sociedad Argentina de Informática e Investigación Operativa (SADIO)
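The idea of using feature hashing as a projection and then hashing by direction can be sketched as follows. This is an assumption-laden illustration, not one of the two families proposed in the paper: the sign-bit step borrows the style of sign random projections, and the names `feature_hash_project`, `sign_signature` and `_h` are hypothetical.

```python
import hashlib


def _h(index: int, salt: str) -> int:
    """Deterministic 64-bit hash of a coordinate index."""
    data = f"{salt}:{index}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def feature_hash_project(vec, k, seed=0):
    """Project a dense vector to k dimensions with signed feature hashing.

    Each input coordinate is added, with a hashed sign, to exactly one
    output bucket, so the map is linear in the input.
    """
    out = [0.0] * k
    for i, x in enumerate(vec):
        if x == 0.0:
            continue
        bucket = _h(i, f"bucket{seed}") % k
        sign = 1.0 if _h(i, f"sign{seed}") % 2 == 0 else -1.0
        out[bucket] += sign * x
    return out


def sign_signature(vec, k=16, seed=0):
    """Assumed hashing step: keep only the sign of each projected coordinate.

    Signs depend on the direction of the vector, not its length, which is
    the property a hash for the angular distance needs.
    """
    return tuple(1 if c >= 0.0 else 0 for c in feature_hash_project(vec, k, seed))
```

Two vectors pointing in the same direction, such as `x` and `3 * x`, receive the same signature under this sketch, while different `seed` values give independent hash functions that could be combined into tables.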
Luis Argerich to the Governor of the Province of Buenos Aires
Reports that, as ordered, the war supplies issued for the service of the Ejército Auxiliar de los Andes have been loaded onto the carts of Don Domingo Cruz. Bears a rubric of Juan Manuel de Rosas. Copi
List of war supplies for the Regimiento de los Auxiliares de los Andes
Bears a rubric of Juan Manuel de Rosas. Copi
Dispatcher3 – Machine learning for efficient flight planning: approach and challenges for data-driven prototypes in air transport
Machine learning techniques to support decision-making processes are increasingly popular. They are particularly relevant in the context of flight management, where large datasets of planned and realised operations are available. Current operations experience discrepancies between planned and executed flight plans. These may be due to external factors (e.g. weather, congestion) and may lead to sub-optimal decisions (e.g. recovering delay by burning extra fuel when no holding is expected at arrival, so the recovery was not needed). Dispatcher3 produces a set of machine learning models to support flight crew pre-departure, with estimations of expected holding at arrival, runway in use and fuel usage, and to support the airline’s duty manager on pre-tactical actions, with models trained with a larger look-ahead time for ATFM and reactionary delay estimations. This paper describes the prototype architecture and approach of Dispatcher3, with particular focus on the challenges faced by this type of data-driven machine learning model in the field of air transport, ranging from technical aspects such as data leakage to operational requirements such as the consideration and estimation of uncertainty. These considerations should be relevant for projects that try to use machine learning in the field of aviation in general. This work is performed as part of the Dispatcher3 innovation action, which has received funding from the Clean Sky 2 Joint Undertaking (JU) under grant agreement No 886461. The Topic Manager is Thales AVS France SAS. The JU receives support from the European Union’s Horizon 2020 research and innovation programme and the Clean Sky 2 JU members other than the Union. The opinions expressed herein reflect the authors’ views only. Under no circumstances shall the Clean Sky 2 Joint Undertaking be responsible for any use that may be made of the information contained herein. Postprint (published version)